Predicting Rare Classes: Comparing Two-Phase Rule Induction to Cost-Sensitive Boosting

Authors

  • Mahesh V. Joshi
  • Ramesh C. Agarwal
  • Vipin Kumar

Abstract

Learning good classifier models of rare events is a challenging task. On such problems, the recently proposed two-phase rule induction algorithm, PNrule, outperforms other non-meta methods of rule induction. Boosting is a strong meta-classifier approach and has been shown to be adaptable to skewed class distributions. PNrule's key feature is its ability to identify the relevant false positives and to remove them collectively. In this paper, we argue qualitatively that boosting does not guarantee this ability. We simulate learning scenarios of varying difficulty to demonstrate that this fundamental qualitative difference between the two mechanisms leads to many scenarios in which PNrule achieves comparable or significantly better performance than AdaCost, a strong cost-sensitive boosting algorithm. Even comparable performance by PNrule is desirable, because PNrule yields a model that is easier to interpret than the ensemble of models generated by boosting. We also report similar supporting results on real-world and benchmark datasets.
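
To make the two-phase idea concrete, here is a toy Python sketch. It is not PNrule itself: it assumes numeric features, learns only a single threshold rule per phase, and does not attempt to reproduce the paper's actual rule-growing or scoring details. The helper names (`learn_p_rule`, `learn_n_rule`, `predict`) and the toy data are hypothetical. The P-phase picks a rule that covers every positive with as few negatives as possible; the N-phase then looks only at the records that rule covers and picks a rule that removes its false positives collectively while sacrificing as few true positives as possible.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical rule representation: (feature_index, threshold),
# which fires when x[feature_index] >= threshold.
Rule = Tuple[int, float]


def fires(rule: Rule, x: Sequence[float]) -> bool:
    feature, threshold = rule
    return x[feature] >= threshold


def candidate_rules(X: List[Sequence[float]]) -> List[Rule]:
    # Every (feature, observed value) pair, sorted so ties break deterministically.
    return sorted({(f, x[f]) for x in X for f in range(len(x))})


def learn_p_rule(X: List[Sequence[float]], y: List[int]) -> Rule:
    # P-phase: among rules that cover every positive, keep the one covering the fewest negatives.
    best, best_neg = None, None
    for rule in candidate_rules(X):
        covered = [yi for xi, yi in zip(X, y) if fires(rule, xi)]
        if sum(covered) < sum(y):
            continue  # this rule misses at least one positive
        neg = len(covered) - sum(covered)
        if best is None or neg < best_neg:
            best, best_neg = rule, neg
    return best  # never None: each feature's minimum value covers all records


def learn_n_rule(X: List[Sequence[float]], y: List[int], p_rule: Rule) -> Optional[Rule]:
    # N-phase: restrict to records the P-rule covers; its false positives become the target.
    covered = [(xi, yi) for xi, yi in zip(X, y) if fires(p_rule, xi)]
    best, best_score = None, 0
    for rule in candidate_rules([xi for xi, _ in covered]):
        fp_removed = sum(1 for xi, yi in covered if yi == 0 and fires(rule, xi))
        tp_lost = sum(1 for xi, yi in covered if yi == 1 and fires(rule, xi))
        if fp_removed - tp_lost > best_score:
            best, best_score = rule, fp_removed - tp_lost
    return best  # None if no rule removes more false positives than true positives


def predict(p_rule: Rule, n_rule: Optional[Rule], x: Sequence[float]) -> int:
    # Positive only if the P-rule fires and the N-rule (if any) does not.
    return int(fires(p_rule, x) and not (n_rule is not None and fires(n_rule, x)))


# Toy data: positives have high f0 and low f1; two negatives share high f0 but have high f1.
X = [(5, 1), (6, 2), (7, 1), (6, 9), (7, 8), (1, 1), (2, 9), (1, 5)]
y = [1, 1, 1, 0, 0, 0, 0, 0]
p_rule = learn_p_rule(X, y)
n_rule = learn_n_rule(X, y, p_rule)
print(p_rule, n_rule)  # (0, 5) (1, 8)
print([predict(p_rule, n_rule, x) for x in X] == y)  # True
```

On this toy data, the P-rule `f0 >= 5` covers all three positives plus two false positives, and the N-rule `f1 >= 8`, learned only on the covered records, removes exactly those two.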

Similar resources

Boosting Trees for Cost-Sensitive Classifications

This paper explores two boosting techniques for cost-sensitive tree classifications in the situation where misclassification costs change very often. Ideally, one would like to have only one induction, and use the induced model for different misclassification costs. Thus, it demands robustness of the induced model against cost changes. Combining multiple trees gives robust predictions against this ...

Intrusion detection by integrating boosting genetic fuzzy classifier and data mining criteria for rule pre-screening

The purpose of the work described in this paper is to provide an intelligent intrusion detection system (IIDS) that uses two of the most popular data mining tasks, namely classification and association rules mining together for predicting different behaviors in networked computers. To achieve this, we propose a method based on iterative rule learning using a fuzzy rule-based genetic classifier....

Boosting Cost-Sensitive Trees

This paper explores two techniques for boosting cost-sensitive trees. The two techniques differ in whether the misclassification cost information is utilized during training. We demonstrate that each of these techniques is good at different aspects of cost-sensitive classifications. We also show that both techniques provide a means to overcome the weaknesses of their base cost-sensitive tree induct...

A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate

Support vector machine (SVM) is a popular classification technique that classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which implies that SVM training amounts to an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...

Publication date: 2002